Tuesday, the 31th of January, 2023
Regarding the analysis of longitudinal RCT data there is a debate going on whether an adjustment for the baseline value of the outcome variable should be made. When an adjustment is made, there is a lot of misunderstanding regarding the way this should be done. Therefore, the aims to:
to explain different methods used to estimate treatment effects in RCTs
to illustrate the different methods with a real data example
to give an advise on how to analyse RCT data
Treatment of Lead Exposed Children Trial (N=100)
| id | trt | w_0 | w_1 | w_4 | w_6 |
|---|---|---|---|---|---|
| 1 | Placebo | 31.9 | 26.9 | 25.8 | 23.8 |
| 2 | Active | 26.5 | 14.8 | 19.5 | 21.0 |
| 3 | Active | 25.8 | 23.0 | 19.1 | 23.2 |
| 4 | Placebo | 26.5 | 24.5 | 22.0 | 22.5 |
| 5 | Active | 20.4 | 2.8 | 3.2 | 9.4 |
| 99 | Active | 21.9 | 7.6 | 10.8 | 13.0 |
| 100 | Active | 20.7 | 8.1 | 25.7 | 12.3 |
| id | trt | week | lead |
|---|---|---|---|
| 1 | Placebo | 0 | 31.9 |
| 1 | Placebo | 1 | 26.9 |
| 1 | Placebo | 4 | 25.8 |
| 1 | Placebo | 6 | 23.8 |
| 2 | Active | 0 | 26.5 |
| 100 | Active | 4 | 25.7 |
| 100 | Active | 6 | 12.3 |
Treatment of Lead Exposed Children Trial (N=100)
tbl_summary(data = tlc_w,
by = trt,
include = starts_with('w'),
digit = list(everything() ~ c(1,2)),
statistic = list(all_continuous() ~ "{mean} ({sd})"),
label = list( w_0 ~ 'Week 0',
w_1 ~ 'Week 1',
w_4 ~ 'Week 4',
w_6 ~ 'Week 6')) %>%
add_difference(test = list(all_continuous() ~ "t.test"),
pvalue_fun = ~style_pvalue(.x, digits = 2),
estimate_fun = list(all_continuous() ~ function(x) style_number(x, digits=1)) ) %>%
bold_labels() %>%
modify_column_merge(pattern = "{estimate} ({ci})") %>%
modify_header(label = '',
estimate = '**Difference (95% CI)**',
all_stat_cols(FALSE) ~ "**{level}**<br>n = {n}" ) %>%
modify_spanning_header(all_stat_cols(FALSE) ~ "**Treatment**") %>%
# modify_footnote(everything() ~ NA) %>%
add_n()| N | Treatment | Difference (95% CI)2 | p-value2 | ||
|---|---|---|---|---|---|
| Active n = 501 |
Placebo n = 501 |
||||
| Week 0 | 100 | 26.5 (5.02) | 28.0 (4.98) | -1.5 (-3.4, 0.5) | 0.15 |
| Week 1 | 100 | 13.5 (7.67) | 24.7 (5.46) | -11.1 (-13.8, -8.5) | <0.001 |
| Week 4 | 100 | 15.5 (7.85) | 24.1 (5.75) | -8.6 (-11.3, -5.8) | <0.001 |
| Week 6 | 100 | 20.8 (9.25) | 23.6 (5.64) | -2.9 (-5.9, 0.2) | 0.063 |
| 1 Mean (SD) | |||||
| 2 Welch Two Sample t-test | |||||
Treatment of Lead Exposed Children Trial with Missing Data (N=100)
tbl_summary(data = tlcmiss_w ,
by = trt,
missing = 'no',
include = starts_with('w'),
digit = list(everything() ~ c(1,2)),
statistic = list(all_continuous() ~ "{mean} ({sd})"),
label = list( w_0 ~ 'Week 0',
w_1 ~ 'Week 1',
w_4 ~ 'Week 4',
w_6 ~ 'Week 6')) %>%
add_difference(test = list(all_continuous() ~ "t.test"),
pvalue_fun = ~style_pvalue(.x, digits = 2),
estimate_fun = list(all_continuous() ~ function(x) style_number(x, digits=1)) ) %>%
bold_labels() %>%
modify_column_merge(pattern = "{estimate} ({ci})") %>%
modify_header(label = '',
estimate = '**Difference (95% CI)**',
all_stat_cols(FALSE) ~ "**{level}**<br>n = {n}" ) %>%
modify_spanning_header(all_stat_cols(FALSE) ~ "**Treatment**") %>%
# modify_footnote(everything() ~ NA) %>%
add_n() | N | Treatment | Difference (95% CI)2 | p-value2 | ||
|---|---|---|---|---|---|
| Active n = 501 |
Placebo n = 501 |
||||
| Week 0 | 100 | 26.5 (5.02) | 28.0 (4.98) | -1.5 (-3.4, 0.5) | 0.15 |
| Week 1 | 88 | 13.4 (8.11) | 24.5 (5.46) | -11.1 (-14.1, -8.1) | <0.001 |
| Week 4 | 81 | 15.2 (7.08) | 24.1 (5.80) | -8.9 (-11.8, -5.9) | <0.001 |
| Week 6 | 74 | 19.3 (10.48) | 23.7 (5.69) | -4.4 (-8.9, 0.1) | 0.055 |
| 1 Mean (SD) | |||||
| 2 Welch Two Sample t-test | |||||
| Characteristic | Overall | Active | Placebo |
|---|---|---|---|
| Week 0 | |||
| N | 100 | 50 | 50 |
| Mean (SD) | 27.3 (5.03) | 26.5 (5.02) | 28.0 (4.98) |
| Median (IQR) | 26.6 (23.0, 30.7) | 26.2 (22.1, 29.6) | 27.1 (23.6, 31.8) |
| Range | 19.7, 41.1 | 19.7, 41.1 | 20.7, 40.7 |
| N missing | 0 | 0 | 0 |
| Week 1 | |||
| N | 88 | 41 | 47 |
| Mean (SD) | 19.3 (8.78) | 13.4 (8.11) | 24.5 (5.46) |
| Median (IQR) | 20.9 (12.9, 25.2) | 11.6 (6.5, 17.5) | 23.9 (20.9, 27.8) |
| Range | 2.8, 40.8 | 2.8, 39.0 | 14.9, 40.8 |
| N missing | 12 | 9 | 3 |
| Week 4 | |||
| N | 81 | 35 | 46 |
| Mean (SD) | 20.2 (7.73) | 15.2 (7.08) | 24.1 (5.80) |
| Median (IQR) | 21.0 (15.9, 25.1) | 15.1 (9.1, 20.4) | 22.5 (19.8, 27.4) |
| Range | 3.0, 38.6 | 3.0, 27.0 | 15.3, 38.6 |
| N missing | 19 | 15 | 4 |
| Week 6 | |||
| N | 74 | 26 | 48 |
| Mean (SD) | 22.2 (7.93) | 19.3 (10.48) | 23.7 (5.69) |
| Median (IQR) | 21.2 (18.4, 24.8) | 18.5 (13.5, 22.1) | 22.4 (20.1, 27.6) |
| Range | 4.1, 63.9 | 4.1, 63.9 | 13.5, 43.3 |
| N missing | 26 | 24 | 2 |
ggplot(data = tlc_l %>%
mutate(week = as.character(week) %>% as.numeric()),
aes(x = week, y = lead, color = trt)) +
geom_quasirandom(dodge.width=1, alpha = 0.5, size = 2.5) +
scale_colour_brewer(palette = "Set1",
name = 'Treatment') +
scale_y_continuous(name = 'Lead (micrograms/dL)',
limits = c(0, 65)) +
scale_x_continuous(name = 'Week',
breaks = c(0, 1, 4, 6),
labels = c('0\nRandomization', '1', '4', '6')) +
theme_bw(base_size = 20) +
theme(panel.grid.minor = element_blank(),
legend.position = c(0.02, 0.98),
legend.justification = c('left','top'))m2 <- mmrm(lead ~ trt * week + us(week | id), data = tlc_l)
e2 <- emmeans(m2, pairwise ~ trt | week) %>%
as.data.frame() %>%
mutate(week = as.numeric(week))
f_a <- ggplot(data = tlc_l %>%
mutate(week = as.character(week) %>% as.numeric()),
aes(x = week, y = lead, color = trt) ) +
geom_quasirandom(dodge.width=1, alpha = 0.2, size = 2.5, show.legend = FALSE) +
geom_boxplot(aes(group = interaction(week, trt)),
fill = 'transparent',
outlier.shape = NA,
show.legend = FALSE) +
scale_colour_brewer(palette = "Set1",
name = 'Treatment') +
scale_y_continuous(name = 'Lead (micrograms/dL)',
limits = c(0, 65)) +
scale_x_continuous(name = NULL,
breaks = c(0, 1, 4, 6),
labels = c('0\nRandomization', '1', '4', '6')) +
stat_summary(fun = mean,
geom = "point",
size = 3,
position = position_dodge(width = 1)) +
stat_summary(fun = mean,
geom = "line",
size = 1,
position = position_dodge(width = 1),
show.legend = FALSE) +
theme_bw(base_size = 20) +
theme(panel.grid.minor = element_blank(),
legend.position = c(0.02, 0.98),
legend.justification = c('left','top'))
f_b <- ggplot(data = e2 %>% filter(contrast != '.'),
aes(x = week, y = emmean, group = contrast)) +
geom_hline(yintercept = 0, color = 'gray50') +
geom_point(size = 3) +
geom_line(size = 1) +
geom_linerange(aes(ymin = lower.CL,
ymax = upper.CL),
size = 1) +
geom_text(aes(y = 4.75,
label = gtsummary::style_pvalue(p.value, digits = 2, prepend_p = TRUE)),
size = 4) +
scale_color_discrete(name = 'Treatment') +
scale_y_continuous(name = 'Difference',
limits = c(-15, 5)) +
scale_x_continuous(name = 'Week',
expand = c(0.12,0.12),
breaks = c(0, 1, 4, 6),
labels = c('0\nRandomization', '1', '4', '6')) +
theme_bw(base_size = 20) +
theme(panel.grid.minor = element_blank())
f_a / f_b + plot_layout( heights = c(3, 1))ggplot(data = tlc_l %>%
mutate(week = as.character(week) %>% as.numeric()),
aes(x = week, y = lead, color = trt, group = id)) +
geom_quasirandom(dodge.width = 0.5, alpha = 0.5, size = 2.5) +
geom_line(alpha = 0.5,
position=position_quasirandom(dodge.width = 0.5)) +
facet_wrap(. ~ trt ) +
scale_colour_brewer(palette = "Set1",
name = 'Treatment') +
scale_y_continuous(name = 'Lead (micrograms/dL)',
limits = c(0, 65)) +
scale_x_continuous(name = 'Week',
breaks = c(0, 1, 4, 6),
labels = c('0\nRandomization', '1', '4', '6')) +
theme_bw(base_size = 18) +
theme(panel.grid.minor = element_blank(),
legend.position="none")\[Y_{t} = \beta_0 + \beta_1X + \beta_2Y_{t0}\]
mmrm fit
Formula: lead ~ trt + base + us(week | id)
Data: %>%, tlc3_l, filter(week != "0") (used 200 observations from 100
subjects with maximum 2 timepoints)
Covariance: unstructured (3 variance parameters)
Method: Satterthwaite
Inference: REML
Model selection criteria:
AIC BIC logLik deviance
1296.0 1303.8 -645.0 1290.0
Coefficients:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.73622 2.87810 97.00000 0.256 0.799
trtActive -4.97544 0.99763 97.00000 -4.987 2.68e-06 ***
base 0.82712 0.09974 97.00000 8.293 6.37e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Covariance estimate:
4 6
4 34.4910 10.2957
6 10.2957 43.8530
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 0.74 | 2.88 | 0.26 | 0.80 |
| trtActive | −4.98 | 1.00 | −4.99 | <0.001 |
| base | 0.83 | 0.10 | 8.29 | <0.001 |
\[Y_{t} = \beta_0 + \beta_1X + \beta_2Y_{t0} + \beta_3time + \beta_4X\times time\]
mmrm fit
Formula: lead ~ trt + base + week + week:trt + us(week | id)
Data: %>%, tlc3_l, filter(week != "0") (used 200 observations from 100
subjects with maximum 2 timepoints)
Covariance: unstructured (3 variance parameters)
Method: Satterthwaite
Inference: REML
Model selection criteria:
AIC BIC logLik deviance
1265.1 1272.9 -629.6 1259.1
Coefficients:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.91382 2.90529 100.46000 0.315 0.754
trtActive -7.35171 1.14443 97.99000 -6.424 4.80e-09 ***
base 0.82712 0.09974 97.00000 8.293 6.37e-13 ***
week6 -0.42400 0.94645 98.00000 -0.448 0.655
trtActive:week6 5.67200 1.33848 98.00000 4.238 5.11e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Covariance estimate:
4 6
4 32.2160 13.4516
6 13.4516 39.4752
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 0.91 | 2.91 | 0.31 | 0.75 |
| trtActive | −7.35 | 1.14 | −6.42 | <0.001 |
| base | 0.83 | 0.10 | 8.29 | <0.001 |
| week6 | −0.42 | 0.95 | −0.45 | 0.66 |
| trtActive:week6 | 5.67 | 1.34 | 4.24 | <0.001 |
\[Y_{t} = \beta_0 + \beta_1X + \beta_2time + \beta_3time\times X\]
mmrm fit
Formula: lead ~ trt + follow + follow:trt + us(week | id)
Data: %>%, tlc3_l, mutate(follow = as.numeric(follow) - 1) (used 300
observations from 100 subjects with maximum 3 timepoints)
Covariance: unstructured (6 variance parameters)
Method: Satterthwaite
Inference: REML
Model selection criteria:
AIC BIC logLik deviance
1900.0 1915.7 -944.0 1888.0
Coefficients:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 28.019 0.705 98.000 39.743 < 2e-16 ***
trtActive -1.763 0.997 98.000 -1.768 0.0801 .
follow -4.108 0.705 98.000 -5.827 7.21e-08 ***
trtActive:follow -4.671 0.997 98.000 -4.685 9.03e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Covariance estimate:
0 4 6
0 25.0204 19.3844 22.5123
4 19.3844 49.1906 27.5832
6 22.5123 27.5832 63.7270
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 28.02 | 0.70 | 39.74 | <0.001 |
| trtActive | −1.76 | 1.00 | −1.77 | 0.080 |
| follow | −4.11 | 0.70 | −5.83 | <0.001 |
| trtActive:follow | −4.67 | 1.00 | −4.68 | <0.001 |
\[Y_{t} = \beta_0 + \beta_1X + \beta_2dummy\_time_1 + \beta_3dummy\_time_2 +\\ \beta_4dumm\_time_1\times X + \beta_5dummy\_time_2 \times X\]
mmrm fit
Formula: lead ~ trt + week + trt:week + us(week | id)
Data: tlc3_l (used 300 observations from 100 subjects with maximum 3
timepoints)
Covariance: unstructured (6 variance parameters)
Method: Satterthwaite
Inference: REML
Model selection criteria:
AIC BIC logLik deviance
1869.1 1884.8 -928.6 1857.1
Coefficients:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 27.9960 0.7069 98.0000 39.605 < 2e-16 ***
trtActive -1.4560 0.9997 98.0000 -1.456 0.148
week4 -3.9260 0.8132 98.0000 -4.828 5.09e-06 ***
week6 -4.3500 0.8887 98.0000 -4.895 3.87e-06 ***
trtActive:week4 -7.1000 1.1501 98.0000 -6.174 1.51e-08 ***
trtActive:week6 -1.4280 1.2567 98.0000 -1.136 0.259
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Covariance estimate:
0 4 6
0 24.9838 19.6480 22.0747
4 19.6480 47.3781 30.6207
6 22.0747 30.6207 58.6510
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 28.00 | 0.71 | 39.61 | <0.001 |
| trtActive | −1.46 | 1.00 | −1.46 | 0.15 |
| week4 | −3.93 | 0.81 | −4.83 | <0.001 |
| week6 | −4.35 | 0.89 | −4.90 | <0.001 |
| trtActive:week4 | −7.10 | 1.15 | −6.17 | <0.001 |
| trtActive:week6 | −1.43 | 1.26 | −1.14 | 0.26 |
\[Y_{t} = \beta_0 + \beta_1time + \beta_2time\times X\]
m2c <- mmrm(lead ~ follow + follow:trt +
us(week|id),
data = tlc3_l %>% mutate(follow = as.numeric(follow)-1))
e2f0 <- emmeans(m2c, ~ trt | follow, at = list(follow = 0)) %>% as.data.frame()
e2f1 <- emmeans(m2c,revpairwise ~ trt | follow, at = list(follow = 1)) %>% as.data.frame()
e2c <- bind_rows(e2f0 %>% mutate(follow = '0'),
e2f1)
summary(m2c)mmrm fit
Formula: lead ~ follow + follow:trt + us(week | id)
Data: %>%, tlc3_l, mutate(follow = as.numeric(follow) - 1) (used 300
observations from 100 subjects with maximum 3 timepoints)
Covariance: unstructured (6 variance parameters)
Method: Satterthwaite
Inference: REML
Model selection criteria:
AIC BIC logLik deviance
1904.5 1920.1 -946.2 1892.5
Coefficients:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 27.2535 0.5026 99.0000 54.221 < 2e-16 ***
follow -3.9753 0.6998 99.4800 -5.681 1.34e-07 ***
follow:trtActive -4.9754 0.9820 98.0000 -5.067 1.91e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Covariance estimate:
0 4 6
0 25.2664 20.7527 21.1006
4 20.7527 51.2871 27.3801
6 21.1006 27.3801 61.2242
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 27.25 | 0.50 | 54.22 | <0.001 |
| follow | −3.98 | 0.70 | −5.68 | <0.001 |
| follow:trtActive | −4.98 | 0.98 | −5.07 | <0.001 |
\[Y_{t} = \beta_0 + \beta_1dummytime_1 + \beta_2dummytime_2 + \\ \beta_3dummytime_1 \times X + \beta_4dummytime_2 \times X\]
mmrm fit
Formula: lead ~ I(week == "4") + I(week == "6") + I(week == "4" & trt ==
"Active") + I(week == "6" & trt == "Active") + us(week | id)
Data: tlc3_l (used 300 observations from 100 subjects with maximum 3
timepoints)
Covariance: unstructured (6 variance parameters)
Method: Satterthwaite
Inference: REML
Model selection criteria:
AIC BIC logLik deviance
1873.1 1888.7 -930.5 1861.1
Coefficients:
Estimate Std. Error df t value
(Intercept) 27.2680 0.5027 99.0000 54.247
I(week == "4")TRUE -3.7705 0.8063 99.7200 -4.677
I(week == "6")TRUE -4.2652 0.8868 98.4200 -4.810
I(week == "4" & trt == "Active")TRUE -7.4110 1.1301 98.0000 -6.558
I(week == "6" & trt == "Active")TRUE -1.5975 1.2513 98.0000 -1.277
Pr(>|t|)
(Intercept) < 2e-16 ***
I(week == "4")TRUE 9.17e-06 ***
I(week == "6")TRUE 5.44e-06 ***
I(week == "4" & trt == "Active")TRUE 2.58e-09 ***
I(week == "6" & trt == "Active")TRUE 0.205
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Covariance estimate:
0 4 6
0 25.2668 19.8706 22.3248
4 19.8706 47.5531 30.8174
6 22.3248 30.8174 58.8718
| lead | I(week == "4") | I(week == "6") | I(week == "4" & trt == "Active") | I(week == "6" & trt == "Active") | id | week | |
|---|---|---|---|---|---|---|---|
| 1 | 31.9 | FALSE | FALSE | FALSE | FALSE | 1 | 0 |
| 2 | 25.8 | TRUE | FALSE | FALSE | FALSE | 1 | 4 |
| 3 | 23.8 | FALSE | TRUE | FALSE | FALSE | 1 | 6 |
| 4 | 26.5 | FALSE | FALSE | FALSE | FALSE | 2 | 0 |
| 5 | 19.5 | TRUE | FALSE | TRUE | FALSE | 2 | 4 |
| 6..299 | |||||||
| 300 | 12.3 | FALSE | TRUE | FALSE | TRUE | 100 | 6 |
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 27.27 | 0.50 | 54.25 | <0.001 |
| I(week == "4")TRUE | −3.77 | 0.81 | −4.68 | <0.001 |
| I(week == "6")TRUE | −4.27 | 0.89 | −4.81 | <0.001 |
| I(week == "4" & trt == "Active")TRUE | −7.41 | 1.13 | −6.56 | <0.001 |
| I(week == "6" & trt == "Active")TRUE | −1.60 | 1.25 | −1.28 | 0.20 |
data tlc3_l;
set tlc3_l;
if week = "4" then week_4 = 1; else week_4 = 0;
if week = "6" then week_6 = 1; else week_6 = 0;
run;
proc glimmix data=tlc3_l noclprint = 10;
class id week (ref = first) trt (ref = last);
model lead = week_4 week_6 week_4*trt week_6*trt / solution ddfm=satterthwaite;
random _residual_ / subject=id type=un;
run;\[Y_{t} - Y_{t0} = \beta_0 + \beta_1X\]
mmrm fit
Formula: diff ~ trt + us(week | id)
Data: %>%, tlc3_l, filter(week != "0") (used 200 observations from 100
subjects with maximum 2 timepoints)
Covariance: unstructured (3 variance parameters)
Method: Satterthwaite
Inference: REML
Model selection criteria:
AIC BIC logLik deviance
1296.2 1304.0 -645.1 1290.2
Coefficients:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -4.108 0.705 98.000 -5.827 7.21e-08 ***
trtActive -4.671 0.997 98.000 -4.685 9.03e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Covariance estimate:
4 6
4 35.4435 10.7073
6 10.7073 43.7238
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | −4.11 | 0.70 | −5.83 | <0.001 |
| trtActive | −4.67 | 1.00 | −4.68 | <0.001 |
\[Y_{t} - Y_{t0} = \beta_0 + \beta_1X + \beta_2 time + \beta_3X \times time\]
mmrm fit
Formula: diff ~ trt + week + trt:week + us(week | id)
Data: %>%, tlc3_l, filter(week != "0") (used 200 observations from 100
subjects with maximum 2 timepoints)
Covariance: unstructured (3 variance parameters)
Method: Satterthwaite
Inference: REML
Model selection criteria:
AIC BIC logLik deviance
1265.3 1273.1 -629.7 1259.3
Coefficients:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -3.9260 0.8132 98.0000 -4.828 5.09e-06 ***
trtActive -7.1000 1.1501 98.0000 -6.174 1.51e-08 ***
week6 -0.4240 0.9464 98.0000 -0.448 0.655
trtActive:week6 5.6720 1.3385 98.0000 4.238 5.11e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Covariance estimate:
4 6
4 33.0656 13.8818
6 13.8818 39.4858
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | −3.93 | 0.81 | −4.83 | <0.001 |
| trtActive | −7.10 | 1.15 | −6.17 | <0.001 |
| week6 | −0.42 | 0.95 | −0.45 | 0.66 |
| trtActive:week6 | 5.67 | 1.34 | 4.24 | <0.001 |
\[Y_{t} - Y_{t0} = \beta_0 + \beta_1X +\beta_2Y_{t0}\]
mmrm fit
Formula: diff ~ trt + base + us(week | id)
Data: %>%, tlc3_l, filter(week != "0") (used 200 observations from 100
subjects with maximum 2 timepoints)
Covariance: unstructured (3 variance parameters)
Method: Satterthwaite
Inference: REML
Model selection criteria:
AIC BIC logLik deviance
1296.0 1303.8 -645.0 1290.0
Coefficients:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.73622 2.87810 97.00000 0.256 0.7986
trtActive -4.97544 0.99763 97.00000 -4.987 2.68e-06 ***
base -0.17288 0.09974 97.00000 -1.733 0.0862 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Covariance estimate:
4 6
4 34.4910 10.2957
6 10.2957 43.8530
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 0.74 | 2.88 | 0.26 | 0.80 |
| trtActive | −4.98 | 1.00 | −4.99 | <0.001 |
| base | −0.17 | 0.10 | −1.73 | 0.086 |
\[Y_{t} - Y_{t0} = \beta_0 + \beta_1X + \beta_2 Y_{t0} + \beta_3X \times time\]
mmrm fit
Formula: diff ~ trt + base + week + trt:week + us(week | id)
Data: %>%, tlc3_l, filter(week != "0") (used 200 observations from 100
subjects with maximum 2 timepoints)
Covariance: unstructured (3 variance parameters)
Method: Satterthwaite
Inference: REML
Model selection criteria:
AIC BIC logLik deviance
1265.1 1272.9 -629.6 1259.1
Coefficients:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.91382 2.90529 100.46000 0.315 0.7538
trtActive -7.35171 1.14443 97.99000 -6.424 4.80e-09 ***
base -0.17288 0.09974 97.00000 -1.733 0.0862 .
week6 -0.42400 0.94645 98.00000 -0.448 0.6551
trtActive:week6 5.67200 1.33848 98.00000 4.238 5.11e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Covariance estimate:
4 6
4 32.2160 13.4516
6 13.4516 39.4752
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 0.91 | 2.91 | 0.31 | 0.75 |
| trtActive | −7.35 | 1.14 | −6.42 | <0.001 |
| base | −0.17 | 0.10 | −1.73 | 0.086 |
| week6 | −0.42 | 0.95 | −0.45 | 0.66 |
| trtActive:week6 | 5.67 | 1.34 | 4.24 | <0.001 |
| week | emmean | lower.CL | upper.CL | t.ratio | p.value | |
|---|---|---|---|---|---|---|
| (1a) Longitudinal analysis of covariance | 4/6 | −5.0 ± 1.00 | −7.0 | −3.0 | −4.99 | <0.001 |
| (2a) Repeated measures analysis | 4/6 | −6.4 ± 1.28 | −9.0 | −3.9 | −5.02 | <0.001 |
| (2c) Repeated measures wo/ treatment | 4/6 | −5.0 ± 0.98 | −6.9 | −3.0 | −5.07 | <0.001 |
| (3a) Analysis of changes (not adjusted) | 4/6 | −4.7 ± 1.00 | −6.6 | −2.7 | −4.68 | <0.001 |
| (3c) Analysis of changes (adjusted) | 4/6 | −5.0 ± 1.00 | −7.0 | −3.0 | −4.99 | <0.001 |
Longitudinal analysis of covariance |
Analysis of changes (adjusted) |
|---|---|
| \[Y_{t} = \beta_0 + \beta_1X + \beta_2Y_{t0}\] | \[Y_{t} - Y_{t0} = \beta_0 + \beta_1X +\beta_2Y_{t0}\] |
| \[Y_{t} = \beta_0 + \beta_1X +\beta_2Y_{t0} + Y_{t0}\] | |
| \[Y_{t} = \beta_0 + \beta_1X + (1 + \beta_2) Y_{t0}\] | |
| \[Y_{t} = 0.74 - 4.98 \times X + 0.83 \times Y_{t0}\] | \[Y_{t} = 0.74 - 4.98 \times X - 0.17 \times Y_{t0}\] |
GAM is a powerful and yet simple technique
Easy to interpret (Similar to GLM)
Flexible predictor functions can uncover hidden patterns in the data
Regularization of predictor functions helps avoid overfitting
\[y_i \sim \mathcal{N}(\mu_i, \sigma^2)\]
\[\mu_i = \beta_0 + \beta_1 \mathtt{month}_{i} + \beta_2 \mathtt{month}^2_{1i} + \cdots + \beta_j \mathtt{month}^j_{i}\]
Assumptions
An additive model address the first of these
\[\begin{align*} y_i &\sim \mathcal{D}(\mu_i, \theta) \\ \mathbb{E}(y_i) &= \mu_i = g(\eta_i)^{-1} \end{align*}\]
We model the mean of data as a sum of linear terms:
\[\eta_i = \beta_0 +\sum_j \color{red}{ \beta_j x_{ji}}\]
A GAM is a sum of smooth functions or smooths
\[\eta_i = \beta_0 + \sum_j \color{red}{f_j(x_{ji})}\]
m3a <- gamm(anyexac2 ~ s(month, bs = 'cc'),
data = nejm, verbosePQL = FALSE,
random = list(studyid = ~1),
family = binomial,
control = list(maxIter = 100),
method = 'REML')
m3 <- gamm(anyexac2 ~ group + s(month, bs = 'cc', by = group),
data = nejm, verbosePQL = FALSE,
random = list(studyid = ~1), f
amily = binomial,
control = list(maxIter = 100),
method = 'REML')s() terms are smooths of one or more variables
bs = 'cc' terms specifies a cyclic cubic regression spline
\[\eta_i = \beta_0 + f(\mathtt{month}_i)\]
| Month | N | Overall | Treatment | Difference1 | 95% CI1,2 | p-value1 | |
|---|---|---|---|---|---|---|---|
| Control | Intervention | ||||||
| Jan | 377 | 9.3% | 10.8% | 7.9% | 2.9% | -3.5%, 9.3% | 0.43 |
| Feb | 380 | 10.5% | 14.8% | 6.3% | 8.5% | 1.9%, 15% | 0.011 |
| Mar | 382 | 12.8% | 14.2% | 11.5% | 2.8% | -4.5%, 10% | 0.51 |
| Apr | 385 | 9.6% | 14.1% | 5.2% | 9.0% | 2.6%, 15% | 0.005 |
| May | 380 | 8.9% | 11.2% | 6.7% | 4.5% | -1.8%, 11% | 0.18 |
| Jun | 375 | 5.1% | 7.0% | 3.2% | 3.9% | -1.1%, 8.8% | 0.14 |
| Jul | 372 | 4.0% | 5.5% | 2.6% | 2.8% | -1.7%, 7.4% | 0.26 |
| Aug | 375 | 6.1% | 5.9% | 6.3% | -0.44% | -5.7%, 4.9% | >0.99 |
| Sep | 370 | 9.5% | 13.6% | 5.4% | 8.2% | 1.8%, 15% | 0.012 |
| Oct | 369 | 11.1% | 13.7% | 8.6% | 5.1% | -1.9%, 12% | 0.17 |
| Nov | 371 | 8.6% | 13.0% | 4.3% | 8.7% | 2.5%, 15% | 0.005 |
| Dec | 374 | 9.9% | 10.9% | 8.9% | 1.9% | -4.7%, 8.5% | 0.65 |
| 1 Two sample test for equality of proportions | |||||||
| 2 CI = Confidence Interval | |||||||
m3a <- gamm(anyexac2 ~ s(month, bs = 'cc'), data = nejm, verbosePQL = FALSE,
random = list(studyid = ~1), family = binomial, control = list(maxIter = 100), method = 'REML')
m3 <- gamm(anyexac2 ~ group + s(month, bs = 'cc', by = group), data = nejm, verbosePQL = FALSE,
random = list(studyid = ~1), family = binomial, control = list(maxIter = 100), method = 'REML')# DERIVATVES
f3a_d1 <- derivatives(m3a,
level = 0.68,
type = 'central',
interval = "simultaneous",
order = 1) %>%
draw() +
geom_pointless(aes(color = after_stat(location)),
location = c("min","max"),
size = 3,
color = 'gray25') +
geom_hline(yintercept = 0, color = 'gray50') +
scale_x_continuous(breaks = seq(1, 12, 1),
labels = c("J","F","M","A","M","J","J","A","S","O","N","D"),
expand = c(0.01, 0.01),
minor_breaks = FALSE) +
scale_y_continuous(limits = c(-0.8, 0.8) ) +
labs(x = "Month",
y = "1st Derivative (Rate of Change)",
title = NULL) +
theme_bw(base_size = 20) +
theme(legend.position = "none",
panel.grid.minor = element_blank())
f3a_d2 <- derivatives(m3a,
level = 0.68,
type = 'central',
interval = "simultaneous",
order = 2) %>%
draw() +
geom_pointless(aes(color = after_stat(location)),
location = c("min","max"),
size = 3,
color = 'gray25') +
geom_hline(yintercept = 0, color = 'gray50') +
scale_x_continuous(breaks = seq(1, 12, 1),
labels = c("J","F","M","A","M","J","J","A","S","O","N","D"),
expand = c(0.01, 0.01),
minor_breaks = FALSE) +
scale_y_continuous(limits = c(-1.2, 1.2) ) +
labs(x = "Month",
y = "2nd Derivative (Inflection Point)",
title = NULL) +
theme_bw(base_size = 20) +
theme(legend.position = "none",
panel.grid.minor = element_blank())
# DERIVATVES
d3d1 <- derivatives(m3,
level = 0.68,
type = 'central',
interval = "simultaneous",
order = 1) %>%
mutate(group = factor(smooth, labels = c("Control","Intervention")))
f3_d1 <- ggplot(data = d3d1,
aes(x = data, y = derivative, color = group, label = group)) +
geom_pointless(aes(color = after_stat(location)),
location = c("min","max"),
color = 'gray25',
size = 3) +
facet_wrap(~ group) +
geom_line(linewidth = 1) +
geom_ribbon(aes( ymin = lower, ymax = upper,
color = NULL, fill = smooth, alpha = 0.5))+
scale_x_continuous(breaks = seq(1, 12, 1),
labels = c("J","F","M","A","M","J","J","A","S","O","N","D"),
expand = c(0.01, 0.01),
minor_breaks = FALSE) +
scale_y_continuous(limits = c(-0.8, 0.8) ) +
scale_color_brewer(type = 'qual', palette = 'Dark2') +
scale_fill_brewer(type = 'qual', palette = 'Dark2') +
labs(x = "Month",
y = "1st Derivative (Rate of Change",
title = ) +
theme_bw(base_size = 20) +
theme(legend.position = "none",
panel.grid.minor = element_blank())
d3d2 <- derivatives(m3,
level = 0.68,
type = 'central',
interval = "simultaneous",
order = 2) %>%
mutate(group = factor(smooth, labels = c("Control","Intervention")))
f3_d2 <- ggplot(data = d3d2,
aes(x = data, y = derivative, color = group, label = group)) +
geom_pointless(aes(color = after_stat(location)),
location = c("min","max"),
color = 'gray25',
size = 3) +
facet_wrap(~ group) +
geom_line(linewidth = 1) +
geom_ribbon(aes( ymin = lower, ymax = upper,
color = NULL, fill = smooth, alpha = 0.5))+
scale_x_continuous(breaks = seq(1, 12, 1),
labels = c("J","F","M","A","M","J","J","A","S","O","N","D"),
expand = c(0.01, 0.01),
minor_breaks = FALSE) +
scale_y_continuous(limits = c(-1.2, 1.2) ) +
scale_color_brewer(type = 'qual', palette = 'Dark2') +
scale_fill_brewer(type = 'qual', palette = 'Dark2') +
labs(x = "Month",
y = "2nd Derivative (Inflection Point)",
title = ) +
theme_bw(base_size = 20) +
theme(legend.position = "none",
panel.grid.minor = element_blank())
design <- "
144
255
366
"
f3a +
f3a_d1 +
f3a_d2 +
(f3+facet_wrap(~group) + theme(strip.background = element_blank(),strip.text.x = element_blank())) +
(f3_d1 + theme(strip.background = element_blank(),strip.text.x = element_blank())) +
(f3_d2 + theme(strip.background = element_blank(),strip.text.x = element_blank())) +
plot_layout(design = design) d2s1 <- smooth_samples(m3$gam,
n = 500,
data = d1,
seed = 123) %>%
rename(month = .x1) %>%
group_by(draw, month) %>%
summarise(diff = value[1]-value[2]) %>%
mutate(diff = (diff + 0.35)/10 )
f3_s1 <- ggplot(data = d2s1,
aes(x = month, y = diff, group = draw, label = 'Difference')) +
geom_line(linewidth = 0.25, alpha = 0.25) +
geom_hline(yintercept = 0, color = 'gray50') +
scale_y_continuous(limits = c(-0.05, 0.12),
minor_breaks = FALSE,
expand = c(0, 0.01)) +
scale_x_continuous(breaks = seq(1, 12, 1),
labels = c("J","F","M","A","M","J","J","A","S","O","N","D"),
expand = c(0.01, 0.01),
minor_breaks = FALSE) +
labs(x = "Month",
y = "Difference between smooths",
title = ) +
theme_bw(base_size = 20) +
theme(legend.position = "none",
panel.grid.minor = element_blank())
s1b <- difference_smooths(m3$gam,
smooth = "s(month)",
ci_level = 0.95) %>%
mutate(across(c(diff, lower, upper), ~(.x + 0.35)/10))
f3_s2 <- ggplot(data = s1b,
aes(x = month, y = diff, color = (lower>0) ) ) +
geom_hline(yintercept = 0, color = 'gray50') +
geom_line(aes(group = 1), linewidth = 2) +
geom_ribbon(data = s1b,
inherit.aes = FALSE,
aes(x = month, ymin = lower, ymax = upper, alpha = 0.5))+
scale_y_continuous(limits = c(-0.05, 0.12),
minor_breaks = FALSE,
expand = c(0, 0.01)) +
scale_x_continuous(breaks = seq(1, 12, 1),
labels = c("J","F","M","A","M","J","J","A","S","O","N","D"),
expand = c(0.01, 0.01),
minor_breaks = FALSE) +
labs(x = "Month",
y = "Difference between smooths",
title = NULL) +
theme_bw(base_size = 20) +
theme(legend.position = "none",
panel.grid.minor = element_blank())
f3/f3_s1/f3_s2
## Patient Profiler (Proof of Concept)
| R environment | |
| Setting | Value |
|---|---|
| version | R version 4.2.0 (2022-04-22 ucrt) |
| os | Windows 10 x64 (build 17763) |
| system | x86_64, mingw32 |
| ui | RTerm |
| language | (EN) |
| collate | English_United States.1252 |
| ctype | English_United States.1252 |
| tz | America/Los_Angeles |
| date | 2023-01-31 |
| pandoc | 2.19.2 @ C:/R/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown) |
| R Packages | ||
| Package | Loaded version | Date |
|---|---|---|
| broom | 1.0.3 | 2023-01-25 |
| dplyr | 1.0.10 | 2022-09-01 |
| emmeans | 1.8.4-1 | 2023-01-17 |
| forcats | 0.5.2 | 2022-08-19 |
| geomtextpath | 0.1.1 | 2022-08-30 |
| ggbeeswarm | 0.7.1 | 2022-12-16 |
| ggplot2 | 3.4.0 | 2022-11-04 |
| ggpointless | 0.0.3 | 2022-08-25 |
| gratia | 0.7.3 | 2022-05-09 |
| gt | 0.8.0 | 2022-11-16 |
| gtreg | 0.2.0 | 2022-10-18 |
| gtsummary | 1.7.0 | 2023-01-13 |
| labelled | 2.10.0 | 2022-09-14 |
| MASS | 7.3-58.2 | 2023-01-23 |
| mgcv | 1.8-41 | 2022-10-21 |
| mmrm | 0.2.2 | 2022-12-20 |
| multcomp | 1.4-20 | 2022-08-07 |
| mvtnorm | 1.1-3 | 2021-10-08 |
| nlme | 3.1-161 | 2022-12-15 |
| patchwork | 1.1.2 | 2022-08-19 |
| purrr | 1.0.1 | 2023-01-10 |
| reactable | 0.4.3 | 2023-01-07 |
| reactablefmtr | 2.1.0 | 2022-06-27 |
| readr | 2.1.3 | 2022-10-01 |
| rio | 0.5.29 | 2021-11-22 |
| scatterPlotMatrix | 0.2.0 | 2022-04-11 |
| stringr | 1.5.0 | 2022-12-02 |
| survival | 3.5-0 | 2023-01-09 |
| TH.data | 1.1-1 | 2022-04-26 |
| tibble | 3.1.8 | 2022-07-22 |
| tidyr | 1.3.0 | 2023-01-24 |
| tidyverse | 1.3.2 | 2022-07-18 |
| trelliscopejs | 0.2.6 | 2021-02-01 |